Abstract: The volume of digital content created globally is expected to increase some thirty-fold over the next ten years, to 35 zettabytes. This relentless growth in data poses serious challenges for businesses: big data represents a large and rapidly growing volume of information that is mostly untapped by existing analytical applications and data warehousing systems. Hadoop can be used to analyze this enormous amount of data. Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers; it is designed to scale from a single server to thousands of machines with a very high degree of fault tolerance. The technologies used by big data applications to handle massive data include Hadoop, MapReduce, Apache Hive, NoSQL, and HPCC. This paper gives an overview of big data and explains how Hadoop, with its MapReduce programming model and HDFS storage layer, tackles the problems big data presents.

Keywords: Data Mining, Big Data, Structured Data, BI, Big Data Analytics, OLAP, EDA, Neural Networks, Hadoop, MapReduce, HDFS, Advantages, Disadvantages.
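To make the MapReduce model summarized above concrete, the sketch below shows the canonical word-count job written against the Hadoop MapReduce Java API: the mapper emits a (word, 1) pair for every token in its input split, and the reducer sums the counts for each word. This is an illustrative example rather than the specific method of this paper; the class name WordCount and the command-line arguments args[0] and args[1] (input and output paths in HDFS) are assumed names chosen for the sketch.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: for every token in the input line, emit (word, 1).
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer: sum all counts emitted for the same word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path (assumed)
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path (assumed)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a JAR, such a job would typically be submitted with a command along the lines of "hadoop jar wordcount.jar WordCount <input> <output>". The framework then splits the HDFS input across the cluster, runs mappers in parallel on each split, shuffles intermediate pairs by key to the reducers, and reschedules failed tasks on other nodes, which is the source of the fault tolerance the abstract refers to.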